This replication archive contains the code and data needed to reproduce the tables and figures presented in "Registering Returning Citizens to Vote: A Field Experiment in North Carolina" (published in The Journal of Politics). Because criminal records are sensitive, we do not provide raw criminal records data with personally identifying details. Instead, we provide several de-identified datasets for analysis, along with the merge code that combined the identifiable datasets to create them.

Email Ariel White (arwhi@mit.edu) with questions about these materials. We thank Tomoya Sasaki for his work cleaning up and organizing the code from this project, and Ning Soong for her help testing the replication archive.

---------------------------
List of scripts included:
---------------------------

1) "replication_preparation.R": This script performs most of the merging of identifiable datasets that produces the main (de-identified) datasets used for analysis. Because it relies on names and dates of birth for record linkage, it cannot be run with the data included here, but we include it so others can see the record-linkage process used.

2) "replication_main.R": This script reads in the de-identified main analysis dataset ("main_paper_2025.csv") produced by "replication_preparation.R" and reproduces the main tables of the paper (Tables 1 and 2), as well as Tables A1 and A10 of the SI.

3) "replication_dataloss.R": This script reads in "nc_dataloss_describe.csv" and produces Figures A2 and A3 of the SI.

4) "replication_otherstudies.R": This script reads in several datasets created by "replication_preparation.R" and produces the SI figures and tables on studies beyond the main experiment (the comparison group without records, the pilot studies, and the postcard-bounce test).

5) "replication_samplecomparisons.R": This script produces the SI figures for SI Section A.5, which compare this study's sample and effect sizes to those in other published work.

6) "replication_causalforest.R": This script uses the main de-identified analysis dataset to reproduce the effect-heterogeneity analyses discussed in SI Section A.6 (Figure A5, Table A8).

---------------------------
List of datasets included:
---------------------------

1) "main_paper_2025.csv": This is the main de-identified analysis dataset for the project, produced by "replication_preparation.R". It contains observations for every person included in the main experiment described in the paper.

2) "nc_dataloss_describe.csv": This is a supplemental de-identified analysis dataset, also produced by "replication_preparation.R". It is used to produce Figures A2 and A3 in the SI.

3-4) "comparison_party_treatment.csv" and "NC_party_treatment.csv": These datasets are produced by "replication_preparation.R" and are used to create Tables A2 and A8 in the SI.

5) "tau_forests_use.rds": This file contains the output from the machine-learning (causal forest) approach presented in SI Section A.6.

6) "NC_postcards_deid.csv": This file contains the de-identified data from our postcard-bounce follow-up study, described in SI Section A.7.

7) "NCpilot_deid.csv": This file contains the de-identified data from our pilot studies, described in SI Section A.10.
